NeuroHex: A Deep Q-learning Hex Agent

Authors

  • Kenny Young
  • Ryan B. Hayward
  • Gautham Vasan
Abstract

DeepMind's recent spectacular success in using deep convolutional neural nets and machine learning to build superhuman-level agents, e.g. for Atari games via deep Q-learning and for the game of Go via other deep reinforcement learning methods, raises many questions, including to what extent these methods will succeed in other domains. In this paper we consider DQL for the game of Hex: after supervised initialization, we use self-play to train NeuroHex, an 11-layer CNN that plays Hex on the 13×13 board. Hex is the classic two-player alternate-turn stone-placement game played on a rhombus of hexagonal cells, in which the winner is whoever connects their two opposing sides. Despite the large action and state space, our system trains a Q-network capable of strong play with no search. After two weeks of Q-learning, NeuroHex achieves respective win rates of 20.4% as first player and 2.1% as second player against a 1-second/move version of MoHex, the current ICGA Olympiad Hex champion. Our data suggests further improvement might be possible with more training time.

1 Motivation, Introduction, Background

1.1 Motivation

DeepMind's recent spectacular success in using deep convolutional neural nets and machine learning to build superhuman-level agents, e.g. for Atari games via deep Q-learning and for the game of Go via other deep reinforcement learning methods, raises many questions, including to what extent these methods will succeed in other domains. Motivated by this success, we explore whether DQL can work to build a strong network for the game of Hex.

1.2 The Game of Hex

Hex is the classic two-player connection game played on an n×n rhombus of hexagonal cells. Each player is assigned two opposite sides of the board and a set of colored stones; in alternating turns, each player puts one of their stones on an empty cell; the winner is whoever joins their two sides with a contiguous chain of their stones. Draws are not possible: at most one player can have a winning chain, and if the game ends with the board full, then exactly one player will have such a chain. For each n×n board there exists a winning strategy for the first player [7]. Solving arbitrary Hex positions, i.e. finding their win/loss values, is PSPACE-complete [11]. Despite its simple rules, Hex has deep tactics and strategy. Hex has served as a test bed for algorithms in artificial intelligence since Shannon and E. F. Moore built a resistance network to play the game [12]. Computers have solved all 9×9 1-move openings and two 10×10 1-move openings, and 11×11 and 13×13 Hex are games of the International Computer Games Association's annual Computer Olympiad [8]. In this paper we consider Hex on the 13×13 board.

Fig. 1: The game of Hex. (a) A game in progress: Black wants to join top and bottom, White wants to join left and right. (b) A finished game: Black wins.

1.3 Related Work

The two works that inspire this paper are [10] and [13], both from Google DeepMind. [10] introduces deep Q-learning with experience replay. Q-learning is a reinforcement learning (RL) algorithm that learns a mapping from states to action values by backing up action-value estimates from subsequent states to improve those in previous states. In deep Q-learning the mapping from states to action values is learned by a deep neural network. Experience replay extends standard Q-learning by storing agent experiences in a memory buffer and sampling from these experiences at every time step to perform updates.
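To make the update concrete: standard Q-learning moves the estimate Q(s, a) toward the one-step target r + γ·max over a′ of Q(s′, a′). The sketch below shows a minimal replay-buffer version of this loop; it is an illustration under assumptions, not NeuroHex's training code. The hyperparameter values are placeholders, and q_net (mapping a state to a vector of action values) and optimizer_step (applying one gradient step toward a target) are hypothetical helpers supplied by the surrounding training code.

    # Minimal sketch of Q-learning with experience replay (illustrative;
    # hyperparameters are placeholders, not the values used by NeuroHex).
    import random
    from collections import deque

    import numpy as np

    GAMMA = 0.99          # discount factor (placeholder)
    BUFFER_SIZE = 10_000  # replay memory capacity (placeholder)
    BATCH_SIZE = 32       # minibatch size per update (placeholder)

    replay_buffer = deque(maxlen=BUFFER_SIZE)

    def store_transition(state, action, reward, next_state, done):
        # Store one experience tuple so it can be reused in many updates.
        replay_buffer.append((state, action, reward, next_state, done))

    def sample_and_update(q_net, optimizer_step):
        # Sample past experiences uniformly and back up their action values.
        # q_net(state) -> array of action values; optimizer_step(state,
        # action, target) applies one gradient step. Both are assumed helpers.
        if len(replay_buffer) < BATCH_SIZE:
            return
        for state, action, reward, next_state, done in random.sample(replay_buffer, BATCH_SIZE):
            target = reward
            if not done:
                # Q-learning backup: r + gamma * max_a' Q(s', a')
                target += GAMMA * float(np.max(q_net(next_state)))
            optimizer_step(state, action, target)

Sampling uniformly from the buffer breaks the correlation between consecutive transitions, which is a large part of why experience replay stabilises deep Q-learning.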
This algorithm achieved superhuman performance on several classic Atari games using only raw visual input. [13] introduces AlphaGo, a Go-playing program that combines Monte Carlo tree search with convolutional neural networks: one network guides the search (the policy network) and another evaluates position quality (the value network). Deep reinforcement learning is used to train both the value and policy networks, each of which takes a representation of the game state as input. The policy network outputs a probability distribution over available moves indicating the likelihood of choosing each move. The value network outputs a single scalar value estimating the probability of winning from the given position.
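To make the two-headed design concrete, the sketch below shows one way a shared convolutional trunk can feed separate policy and value heads. It is a generic PyTorch illustration with placeholder layer sizes and input planes, not AlphaGo's (or NeuroHex's) actual architecture.

    # Minimal sketch of a shared conv trunk with policy and value heads
    # (placeholder sizes; not the published AlphaGo or NeuroHex networks).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PolicyValueNet(nn.Module):
        def __init__(self, board_size=13, in_planes=3, channels=32):
            super().__init__()
            # Shared trunk over the board representation.
            self.conv1 = nn.Conv2d(in_planes, channels, kernel_size=3, padding=1)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            # Policy head: one logit per board cell.
            self.policy_conv = nn.Conv2d(channels, 1, kernel_size=1)
            # Value head: a single scalar evaluating the position.
            self.value_fc = nn.Linear(channels * board_size * board_size, 1)

        def forward(self, x):
            h = F.relu(self.conv1(x))
            h = F.relu(self.conv2(h))
            policy_logits = self.policy_conv(h).flatten(1)   # shape: (batch, cells)
            value = torch.tanh(self.value_fc(h.flatten(1)))  # scalar in [-1, 1]
            return policy_logits, value

A softmax over policy_logits yields the probability distribution over moves, while the tanh-squashed output of the value head serves as the scalar estimate of the expected outcome.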

Similar resources

Thinking Fast and Slow with Deep Learning and Tree Search

Sequential decision making problems, such as structured prediction, robotic control, and game playing, require a combination of planning policies and generalisation of those plans. In this paper, we present Expert Iteration (EXIT), a novel reinforcement learning algorithm which decomposes the problem into separate planning and generalisation tasks. Planning new policies is performed by tree sea...

CS229 Final Report Deep Q-Learning to Play Mario

In this paper, I study applying and adjusting DeepMind’s Atari Deep Q-Learning model to train an automatic agent to play the 1985 Nintendo game Super Mario Bros. The agent learns control policies from raw pixel data using deep reinforcement learning. The model is a convolutional neural network trained using only raw frames of the game and basic information such as score and motion.

Deep Reinforcement Learning for Flappy Bird

Reinforcement learning is essential for applications where there is no single correct way to solve a problem. In this project, we show that deep reinforcement learning is very effective at learning how to play the game Flappy Bird, despite the high-dimensional sensory input. The agent is not given information about what the bird or pipes look like; it must learn these representations and directl...

Multiagent cooperation and competition with deep reinforcement learning

Evolution of cooperation and competition can appear when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent env...

Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning

Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems. However, existing multi-agent RL methods typically scale poorly in the problem size. Therefore, a key challenge is to translate the success of deep learning on single-agent RL to the multi-agent setting. A key stumbling block is that indep...

Journal:
  • CoRR

Volume: abs/1604.07097  Issue: -

Pages: -

Publication date: 2016